feat(framework): Phase 3 PR 1 — audit prompt templates + output schema by montfort · Pull Request #85 · StrangeDaysTech/devtrail

montfort · 2026-05-03T06:05:35Z

Summary

First of 6 PRs implementing Phase 3 (multi-model external audit) + the open frictions F2/F5/F7. Framework-only — no CLI code yet.

What's added

- dist/.devtrail/audit-prompts/auditor-primary.md: prompt template for the primary auditor.
- dist/.devtrail/audit-prompts/auditor-secondary.md: prompt template for the secondary auditor (different model family).
- dist/.devtrail/audit-prompts/calibrator-reconciler.md: prompt template for the third-tier calibrator that reconciles the two auditor outputs.
- dist/.devtrail/schemas/audit-output.schema.v0.json: JSON Schema Draft 2020-12 with a oneOf discriminator on audit_role (auditor outputs vs calibrator output).

Architectural decision A1: orchestration-only

Phase 3 v0 is orchestration-only, not an HTTP-API client. The CLI prepares and persists prompts, awaits the operator's responses, validates outputs against the schema, and integrates findings into the Charter telemetry, but it does not invoke any LLM API directly.

Rationale:

- Implementing 3 HTTP clients (OpenAI / Google / Anthropic) is 1-2 weeks of work plus perpetual maintenance whenever the APIs change. That investment is premature for an experimental v0 schema.
- Sentinel's empirical pattern (the 6-cycle dual-audit experiment that motivated Phase 3) already uses this human-in-the-loop shape via /plan-audit skills. The CLI's value-add is the canon (prompt shape + output schema + telemetry integration), not the API call.
- Closes RFC #82 (audit visibility) by design: the prompt resolution and the auditor's response are both files on disk, version-controlled and inspectable.
- Aligns with principle #10 (honesty about what the tool does not do): "no LLM gateway, no model evaluation".
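Because every hand-off in this design is a file on disk, where a cycle stands can in principle be derived just from which files exist. The Rust sketch below illustrates that idea; it is not the CLI's code. Only the two `*.prompt.md` names appear in the PR text; the calibrator prompt path, the response file names, and the `NextStep` / `next_step` names are hypothetical. The existence check is injected as a closure so the logic can be exercised without touching the filesystem.

```rust
/// Illustrative relative paths inside a Charter's audit directory.
pub const PRIMARY_PROMPT: &str = "prompts/auditor-primary.prompt.md";
pub const SECONDARY_PROMPT: &str = "prompts/auditor-secondary.prompt.md";
pub const PRIMARY_RESPONSE: &str = "auditor-primary.md"; // hypothetical name
pub const SECONDARY_RESPONSE: &str = "auditor-secondary.md"; // hypothetical name
pub const CALIBRATOR_PROMPT: &str = "prompts/calibrator-reconciler.prompt.md"; // hypothetical name
pub const CALIBRATOR_RESPONSE: &str = "calibrator.md"; // hypothetical name

/// What should happen next in the (hypothetical) file-driven cycle.
#[derive(Debug, PartialEq, Eq)]
pub enum NextStep {
    Prepare,         // resolve and persist the two auditor prompts
    AwaitAuditors,   // operator runs the prompts through two model families
    Calibrate,       // validate auditor outputs, resolve the calibrator prompt
    AwaitCalibrator, // operator runs the calibrator prompt
    Finalize,        // validate all three outputs, emit telemetry-ready data
}

/// Derive the next step purely from which files exist.
/// `exists` can be backed by the real filesystem or by a test double.
pub fn next_step(exists: impl Fn(&str) -> bool) -> NextStep {
    if !(exists(PRIMARY_PROMPT) && exists(SECONDARY_PROMPT)) {
        NextStep::Prepare
    } else if !(exists(PRIMARY_RESPONSE) && exists(SECONDARY_RESPONSE)) {
        NextStep::AwaitAuditors
    } else if !exists(CALIBRATOR_PROMPT) {
        NextStep::Calibrate
    } else if !exists(CALIBRATOR_RESPONSE) {
        NextStep::AwaitCalibrator
    } else {
        NextStep::Finalize
    }
}
```

Against a real directory one might call `next_step(|p| charter_dir.join(p).is_file())`, with `charter_dir` pointing at the Charter's audit folder. Keeping state only in files is what lets the operator stop between steps and resume later.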

Schema design

- oneOf discriminator on audit_role: three fixed roles, not an arbitrary N.
- The findings_by_category enum (hallucination | implementation_gap | real_debt | false_positive) uses the same vocabulary as external_audit in charter-telemetry.schema.v0.json, so the audit cycle output integrates directly into Charter telemetry at close.
- Every output declares prompt_used: <relative path>, satisfying RFC #82's requirement that the prompt path be discoverable from the output.
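To make the oneOf split concrete, here is a minimal Rust sketch of what the discriminator means in code: `audit_role` selects which of two shapes a document must satisfy, and finding categories are checked against the four-value vocabulary shared with `external_audit`. The role spellings (`auditor-primary`, `auditor-secondary`, `calibrator`) and all type and function names are assumptions for illustration; only the category names come from the schema description.

```rust
/// The three fixed roles. Spellings are assumed, not copied from the schema.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AuditRole { AuditorPrimary, AuditorSecondary, Calibrator }

/// The two shapes the schema's `oneOf` distinguishes.
#[derive(Debug, PartialEq, Eq)]
pub enum Branch { Auditor, Calibrator }

/// Vocabulary shared with `external_audit` in charter telemetry.
/// Declaration order doubles as the index into tally arrays.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FindingCategory { Hallucination, ImplementationGap, RealDebt, FalsePositive }

impl AuditRole {
    pub fn parse(s: &str) -> Result<Self, String> {
        match s {
            "auditor-primary" => Ok(Self::AuditorPrimary),
            "auditor-secondary" => Ok(Self::AuditorSecondary),
            "calibrator" => Ok(Self::Calibrator),
            other => Err(format!("unknown audit_role `{other}`")),
        }
    }

    /// Both auditors share one branch; the calibrator has its own.
    pub fn branch(self) -> Branch {
        match self {
            Self::Calibrator => Branch::Calibrator,
            Self::AuditorPrimary | Self::AuditorSecondary => Branch::Auditor,
        }
    }
}

impl FindingCategory {
    pub fn parse(s: &str) -> Option<Self> {
        match s {
            "hallucination" => Some(Self::Hallucination),
            "implementation_gap" => Some(Self::ImplementationGap),
            "real_debt" => Some(Self::RealDebt),
            "false_positive" => Some(Self::FalsePositive),
            _ => None,
        }
    }
}

/// Count findings per category, rejecting anything outside the shared vocabulary.
/// Result order: [hallucination, implementation_gap, real_debt, false_positive].
pub fn tally(categories: &[&str]) -> Result<[usize; 4], String> {
    let mut counts = [0usize; 4];
    for raw in categories {
        let cat = FindingCategory::parse(raw)
            .ok_or_else(|| format!("unknown category `{raw}`"))?;
        counts[cat as usize] += 1;
    }
    Ok(counts)
}
```

Fixing the roles at three keeps the branch selection a closed `match`; an arbitrary-N design would need a different discriminator.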

Prompt design

- Primary and secondary prompts are structurally identical. The heterogeneity signal lives in the auditor MODEL (a different family per §5.2), not in different prompts. A/B-testing prompt phrasings is forward-looking; v0 keeps them symmetric for clean comparability.
- The calibrator prompt takes both auditor outputs as context and asks for a status assignment (agreed | disputed | unique_primary | unique_secondary | rejected) per finding. Status counts are cross-checked against the body's section count.
- All three prompts include explicit categorization and discipline rules ("don't fabricate findings", "no external sources beyond the prompt").
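The status cross-check can be sketched as follows, assuming the calibrator output declares a count per status and that each reconciled finding in its body opens with a `### ` heading. The heading convention and the function names are hypothetical; the five status names come from the prompt description above.

```rust
/// The five statuses the calibrator prompt asks for.
pub const STATUSES: [&str; 5] =
    ["agreed", "disputed", "unique_primary", "unique_secondary", "rejected"];

/// Count finding sections in the body, assuming (hypothetically)
/// that each one starts with a `### ` heading line.
pub fn count_sections(body: &str) -> usize {
    body.lines().filter(|line| line.starts_with("### ")).count()
}

/// Check that declared per-status counts use only known statuses and
/// sum to the number of finding sections in the body.
/// Returns the matching total on success.
pub fn check_status_counts(counts: &[(&str, usize)], body: &str) -> Result<usize, String> {
    let mut declared: usize = 0;
    for (status, n) in counts {
        if !STATUSES.contains(status) {
            return Err(format!("unknown status `{status}`"));
        }
        declared += n;
    }
    let found = count_sections(body);
    if declared == found {
        Ok(found)
    } else {
        Err(format!(
            "status counts sum to {declared}, but the body has {found} finding sections"
        ))
    }
}
```

A mismatch here usually means the calibrator dropped or duplicated a finding while reconciling, which is exactly the failure the cross-check is meant to surface.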

Test plan

- The JSON Schema parses as valid JSON (checked with Python's json module).
- No dist-manifest.yml change needed: .devtrail/ is already declared recursively.
- PR 2 will validate the schema against real auditor outputs in integration tests.
- PR 6 will smoke-test a full audit cycle in a tempdir.

🤖 Generated with Claude Code

First of 6 PRs implementing Phase 3 (multi-model external audit) + the open frictions F2/F5/F7. Framework-only: no CLI code yet.

Artifacts (all under dist/.devtrail/, auto-distributed via the existing recursive manifest pattern):
- audit-prompts/auditor-primary.md
- audit-prompts/auditor-secondary.md
- audit-prompts/calibrator-reconciler.md
- schemas/audit-output.schema.v0.json

Architectural decision A1 (per the Phase 3 plan): Phase 3 v0 is ORCHESTRATION-ONLY, not an HTTP-API client. The CLI prepares and persists prompts, awaits the operator's responses, validates outputs against the schema, and integrates findings into the Charter telemetry, but does NOT invoke any LLM API directly. Adopters paste the resolved prompts into their auditor of choice (Copilot, Gemini, Claude, etc.), save responses to the canonical paths, and the CLI consolidates.

Rationale for orchestration-only:
- Implementing 3 HTTP clients (OpenAI / Google / Anthropic) is 1-2 weeks of work + perpetual maintenance when APIs change. For an EXPERIMENTAL v0 schema, that investment is premature.
- Sentinel's empirical pattern (the 6-cycle dual-audit experiment that motivated Phase 3) ALREADY uses this human-in-the-loop shape via /plan-audit skills. The CLI's value-add is the canon (prompt shape + output schema + telemetry integration), not the API call.
- Closes RFC #82 (audit visibility) by design: the prompt-resolution and the auditor's response are both files on disk, version-controlled, inspectable, and reproducible by hand if the API call fails.
- Aligns with principle #10 (honesty about what the tool does NOT do): "no LLM gateway, no model evaluation".

Schema design:
- audit-output.schema.v0.json uses oneOf to distinguish auditor outputs (primary/secondary, fresh findings) from calibrator outputs (reconciliation across the two). The `audit_role` field is the discriminator: three fixed roles, not arbitrary N.
- findings_by_category enum (hallucination | implementation_gap | real_debt | false_positive) is the same vocabulary used by the external_audit array in charter-telemetry.schema.v0.json. The audit cycle output integrates directly into Charter telemetry at close.
- Every output declares prompt_used: <relative path>, satisfying RFC #82's requirement that the prompt path be discoverable from the output.

Prompt design:
- Primary and secondary prompts are STRUCTURALLY IDENTICAL. The heterogeneity signal lives in the auditor MODEL (different family per §5.2), not in different prompts. A/B-testing prompt phrasings is forward-looking; v0 keeps them symmetric for clean comparability.
- Calibrator prompt assumes both auditor outputs as context and asks for status assignment (agreed | disputed | unique_primary | unique_secondary | rejected) per finding. Status counts cross-check against body section count; the schema enforces consistency.
- All three prompts include explicit categorization rules + discipline rules ("don't fabricate findings", "no external sources beyond the prompt"). The rules are duplicated across the three so the auditor doesn't need to consult external documentation.

What's NOT in this PR:
- No CLI code yet: the `devtrail charter audit` command lands in PR 2.
- No heterogeneity validation (`--implementer-family` enforcement): v1.
- No invocation of LLM APIs: orchestration-only by design.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…n) (#86)

Second of 6 PRs implementing Phase 3 + open frictions. The CLI command that orchestrates the dual-audit + calibrator cycle, using the prompt templates and output schema shipped in PR 1 (#85).

Architecture A1 (orchestration-only) means the CLI does NOT invoke LLM APIs. The operator pastes resolved prompts into their auditor of choice (Copilot, Gemini, Claude, etc.) and saves responses to canonical paths under audit/charters/<CHARTER-ID>/. The CLI's value is structure (prompt resolution + output schema validation + telemetry-ready YAML), not invocation.

Three steps, each invokable independently:

  $ devtrail charter audit CHARTER-01
  Step 1/3: PREPARE
  Resolves auditor-primary.prompt.md and auditor-secondary.prompt.md against the Charter content + git diff + originating AILOGs, writes to audit/charters/CHARTER-01/prompts/.

  $ devtrail charter audit CHARTER-01 --calibrate
  Step 2/3: CALIBRATE
  Validates the two auditor responses against audit-output.schema.v0.json, resolves the calibrator-reconciler prompt with both responses embedded as context.

  $ devtrail charter audit CHARTER-01 --finalize
  Step 3/3: FINALIZE
  Validates all 3 outputs (auditor-primary + auditor-secondary + calibrator), prints a YAML-formatted external_audit array block ready to paste into the Charter telemetry, and points to the calibrator's reconciliation summary for outcome.scope_change_notes.

Each step is a filesystem mutation. Files persist between steps: the operator can run prepare, walk away, come back days later, and run calibrate. Each step prints clear next-action guidance pointing to the exact paths involved.

Per RFC #82 the resolved prompt is persisted BEFORE any external action. The schema's prompt_used field cites which prompt template was used; the calibrator can verify provenance.

Module shape:
- src/audit_schema.rs: jsonschema wrapper with oneOf-aware error formatting, mirroring telemetry_schema.rs and charter_schema.rs.
- src/commands/charter/audit.rs: 3-step run dispatch, template resolution with placeholder substitution, frontmatter parsing for auditor summaries, external_audit YAML rendering.

Placeholders supported in templates: {{charter_id}}, {{charter_title}}, {{charter_path}}, {{charter_content}}, {{git_range}}, {{git_diff}}, {{ailog_paths}}, {{ailog_contents}}, {{audit_role}}, {{schema_path}}, {{auditor_primary_findings}}, {{auditor_secondary_findings}}. Unknown placeholders are left as literals (no surprise mutations).

Tests:
- 5 unit tests in src/audit_schema.rs (auditor vs calibrator oneOf discriminator, charter_id pattern, auditors_reconciled minItems).
- 5 unit tests in src/commands/charter/audit.rs (canonical_id, template substitution, frontmatter parsing, AuditorSummary).
- 7 integration tests in cli/tests/charter_audit_test.rs covering all three steps + error paths (devtrail-not-installed, unknown charter, calibrate-without-auditor-outputs, schema validation failure, full cycle, mutually-exclusive flags).

400/400 tests pass.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
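Placeholder resolution that leaves unknown keys as literals is easy to get subtly wrong, for example by re-expanding a `{{...}}` that happens to appear inside substituted Charter content. A minimal std-only Rust sketch of the behaviour described above, assuming exact-match `{{name}}` keys with no whitespace trimming; `resolve_template` is a hypothetical name, not necessarily what src/commands/charter/audit.rs exposes:

```rust
use std::collections::HashMap;

/// Resolve `{{name}}` placeholders against `vars`.
/// - Known names are replaced by their value.
/// - Unknown names are copied through literally, braces included.
/// - Substituted values are not re-scanned, so nothing is expanded twice.
pub fn resolve_template(template: &str, vars: &HashMap<&str, String>) -> String {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(start) = rest.find("{{") {
        // Copy the literal text before the placeholder.
        out.push_str(&rest[..start]);
        let after_open = &rest[start + 2..];
        match after_open.find("}}") {
            Some(end) => {
                let name = &after_open[..end];
                match vars.get(name) {
                    Some(value) => out.push_str(value),
                    // Unknown placeholder: leave it exactly as written.
                    None => {
                        out.push_str("{{");
                        out.push_str(name);
                        out.push_str("}}");
                    }
                }
                rest = &after_open[end + 2..];
            }
            None => {
                // Unterminated `{{`: keep the remainder unchanged and stop scanning.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```

Because substituted values are appended without being re-scanned, a `{{charter_content}}` value that itself contains `{{audit_role}}` is emitted verbatim rather than expanded a second time.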

montfort merged commit 7f15541 into main May 3, 2026

montfort deleted the feat/phase3-pr1-audit-artifacts branch May 3, 2026 06:05

montfort mentioned this pull request May 3, 2026

feat(cli): Phase 3 PR 2 — devtrail charter audit (3-step orchestration) #86

Merged

